assembly instruction
AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly
Kyaw, Alexander Htet, Ma, Haotian, Zivkovic, Sasa, Sabin, Jenny
We present an AI-assisted Augmented Reality assembly workflow that uses deep learning-based object recognition to identify different assembly components and display step-by-step instructions. For each assembly step, the system displays a bounding box around the corresponding components in the physical space, and where the component should be placed. By connecting assembly instructions with the real-time location of relevant components, the system eliminates the need for manual searching, sorting, or labeling of different components before each assembly. To demonstrate the feasibility of using object recognition for AR-assisted assembly, we highlight a case study involving the assembly of LEGO sculptures.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.18)
- North America > United States > New York > Tompkins County > Ithaca (0.05)
- North America > United States > New York > New York County > New York City (0.05)
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
Shape assembly is a ubiquitous task in daily life, integral for constructing complex 3D structures like IKEA furniture. While significant progress has been made in developing autonomous agents for shape assembly, existing datasets have not yet tackled the 4D grounding of assembly instructions in videos, essential for a holistic understanding of assembly in 3D space over time. We introduce IKEA Video Manuals, a dataset that features 3D models of furniture parts, instructional manuals, assembly videos from the Internet, and most importantly, annotations of dense spatio-temporal alignments between these data modalities. To demonstrate the utility of IKEA Video Manuals, we present five applications essential for shape assembly: assembly plan generation, part-conditioned segmentation, part-conditioned pose estimation, video object segmentation, and furniture assembly based on instructional video manuals. For each application, we provide evaluation metrics and baseline methods. Through experiments on our annotated data, we highlight many challenges in grounding assembly instructions in videos to improve shape assembly, including handling occlusions, varying viewpoints, and extended assembly sequences.
- Retail (1.00)
- Education > Educational Technology (0.63)
Beyond Trusting Trust: Multi-Model Validation for Robust Code Generation
UMBC CODEBOT '25 Workshop, Columbia, MD, 25-26 February 2025. Bradley McDanel, Franklin and Marshall College (bmcdanel@fandm.edu).
1 Introduction
Ken Thompson's 1984 essay "Reflections on Trusting Trust" demonstrated that even carefully reviewed source code could hide malicious behavior through compromised compilers, because the malicious code exists only in the compiled binary form, not in its source [1]. Today, large language models (LLMs) used as code generators [2, 3] present an even more opaque security challenge than classical compilers. While compiler binaries can be analyzed for malicious behavior, LLMs operate through vast matrices of weights combined in non-linear ways, making it difficult to develop robust methods for identifying embedded behaviors [4, 5]. This paper revisits Thompson's analogy in the context of LLM-based code generation. We show how malicious behavior might be subtly embedded into a widely used model and argue that direct inspection of the model's parameters is currently infeasible.
Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images
Image-guided object assembly represents a burgeoning research topic in computer vision. This paper introduces a novel task: translating multi-view images of a structural 3D model (for example, one constructed with building blocks drawn from a 3D-object library) into a detailed sequence of assembly instructions executable by a robotic arm. Fed with multi-view images of the target 3D model for replication, the model designed for this task must address several sub-tasks, including recognizing individual components used in constructing the 3D model, estimating the geometric pose of each component, and deducing a feasible assembly order adhering to physical rules. Establishing accurate 2D-3D correspondence between multi-view images and 3D objects is technically challenging. To tackle this, we propose an end-to-end model known as the Neural Assembler. This model learns an object graph where each vertex represents recognized components from the images, and the edges specify the topology of the 3D model, enabling the derivation of an assembly plan. We establish benchmarks for this task and conduct comprehensive empirical evaluations of Neural Assembler and alternative solutions. Our experiments clearly demonstrate the superiority of Neural Assembler.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
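The Neural Assembler abstract above describes an object graph whose vertices are recognized components and whose edges encode the topology of the 3D model, from which an assembly plan is derived. As a rough illustration only, and not the paper's method, the sketch below assumes each edge means "this component must be placed before that one" and derives one feasible order with a standard topological sort (Kahn's algorithm). The component names and the `supports` edge list are hypothetical.

```python
from collections import deque


def assembly_order(components, supports):
    """Return one feasible placement order for the given components.

    `supports` is a list of (below, above) pairs: `below` must be placed
    before `above`. This edge meaning is an assumption made for the sketch.
    """
    indegree = {c: 0 for c in components}
    children = {c: [] for c in components}
    for below, above in supports:
        children[below].append(above)
        indegree[above] += 1

    # Start with every component that depends on nothing.
    ready = deque(sorted(c for c in components if indegree[c] == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in sorted(children[current]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(components):
        raise ValueError("dependency cycle: no feasible assembly order")
    return order


# Hypothetical four-block structure: a base, two pillars, and a roof.
components = ["base", "pillar_left", "pillar_right", "roof"]
supports = [
    ("base", "pillar_left"),
    ("base", "pillar_right"),
    ("pillar_left", "roof"),
    ("pillar_right", "roof"),
]
plan = assembly_order(components, supports)
print(plan)
```

A learned model would have to predict both the components and the edges from images; the sort only covers the final step of turning a known graph into an order.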
Learning to Reverse DNNs from AI Programs Automatically
Chen, Simin, Khanpour, Hamed, Liu, Cong, Yang, Wei
As DNNs are increasingly deployed privately on edge devices, the security of on-device DNNs has raised significant concern. To quantify the model leakage risk of on-device DNNs automatically, we propose NNReverse, the first learning-based method that can reverse DNNs from AI programs without domain knowledge. NNReverse trains a representation model to represent the semantics of binary code for DNN layers. By searching for the most similar function in our database, NNReverse infers the layer type of a given function's binary code. To represent the semantics of assembly instructions precisely, NNReverse proposes a more fine-grained embedding model that represents the textual and structural semantics of assembly functions.
- North America > United States > Texas (0.04)
- Europe > United Kingdom (0.04)
- Asia > Middle East > Jordan (0.04)
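The NNReverse abstract above describes inferring a layer type by searching a database for the function most similar to a given function's binary code. A minimal sketch of that lookup step follows, assuming functions have already been mapped to embedding vectors and that similarity is measured with cosine similarity. Both are assumptions for illustration; the vectors, labels, and the `infer_layer_type` helper are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def infer_layer_type(query, database):
    """Return (label, score) of the database entry most similar to `query`."""
    best_label, best_score = None, -2.0  # cosine similarity is always >= -1
    for label, embedding in database:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical database of (layer type, function embedding) pairs.
DATABASE = [
    ("conv2d", [0.9, 0.1, 0.0]),
    ("dense", [0.1, 0.9, 0.1]),
    ("relu", [0.0, 0.2, 0.9]),
]

label, score = infer_layer_type([0.8, 0.2, 0.1], DATABASE)
print(label, round(score, 3))
```

The linear scan is fine for a toy database; a real system with many functions would likely use an approximate nearest-neighbor index instead.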
BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer
Artuso, Fiorella, Mormando, Marco, Di Luna, Giuseppe A., Querzoni, Leonardo
A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors. If the embedding network is trained such that the translation from code to vectors partially preserves the semantics, the network effectively represents an assembly code model. In this paper we present BinBert, a novel assembly code model. BinBert is built on a transformer pre-trained on a huge dataset of both assembly instruction sequences and symbolic execution information. BinBert can be applied to assembly instruction sequences and it is fine-tunable, i.e., it can be re-trained as part of a neural architecture on task-specific data. Through fine-tuning, BinBert learns how to apply the general knowledge acquired with pre-training to the specific task. We evaluated BinBert on a multi-task benchmark that we specifically designed to test the understanding of assembly code. The benchmark is composed of several tasks, some taken from the literature, and a few novel tasks that we designed, with a mix of intrinsic and downstream tasks. Our results show that BinBert outperforms state-of-the-art models for binary instruction embedding, raising the bar for binary code understanding.
- Education (1.00)
- Information Technology > Security & Privacy (0.93)
Function Naming in Stripped Binaries Using Neural Networks
Artuso, Fiorella, Di Luna, Giuseppe Antonio, Massarelli, Luca, Querzoni, Leonardo
Abstract: In this paper we investigate the problem of automatically naming pieces of assembly code, where by naming we mean assigning to a portion of code the string of words that a human reverse engineer would likely assign. We formally and precisely define the framework in which our investigation takes place. That is, we define the problem, we provide reasonable justifications for the choices that we made in designing the training and test steps, and we perform a statistical analysis of function names in a large real-world corpus of over 4 million functions. In this framework we test several baselines coming from the field of NLP (e.g., Seq2Seq networks and transformers). Moreover, we provide a set of tailored solutions that beat the aforementioned baselines. The last few years have witnessed a growing trend of applying machine learning (ML) and natural language processing (NLP) techniques to code, as illustrated in [14].
Sequence Intent Classification Using Hierarchical Attention Networks - Developer Blog
In this code story, we will discuss applications of Hierarchical Attention Neural Networks for sequence classification. In particular, we will use our work in the domain of malware detection and classification as a sample application. Malware, or malicious software, refers to harmful computer programs such as viruses, ransomware, spyware, adware, and others that are usually unintentionally installed and executed. When detecting malware in a running process, a typical sequence to analyze could be a set of disk access actions that the program has taken. To analyze software without running it, we can treat series of assembly instructions in the disassembled binary as sequences to be classified in order to identify sections of the code with malicious intent.
Scientists create a robot that can put together an Ikea chair
It is the cause of countless marital rows over mislaid Allen keys and baffling instructions. But while trying to build Ikea furniture can ruin a weekend for many couples, the robots at least have got it sussed. Robots can build an Ikea chair in under nine minutes, mechanical engineers have discovered, after being programmed to fit the parts together perfectly. People, according to Ikea, take 10 to 15 minutes on average to build the same item of furniture.
- Retail (1.00)
- Government > Regional Government > Europe Government (0.34)